Reducing the complexity of large systems described as complex networks is key to understanding 
them, and a crucial issue is to know which properties of the initial system are preserved in the 
reduced one. Here we use random walks to design a coarse-graining scheme for complex networks. 
By construction the coarse-graining preserves the slow modes of the walk, while significantly reducing 
the size and the complexity of the network. In this sense our coarse-graining makes it possible to 
approximate large networks by smaller ones, keeping most of their relevant spectral properties. 

PACS numbers: 89.75.Fb, 02.50.-r 



One of the most difficult hurdles in the analysis and visualization 
of large complex networks [1, 2, 3, 4] is, understandably, 
their sheer size. Given that most algorithms 
used to extract information from a network topology run 
in times that grow polynomially with the number $N$ of 
network nodes, even networks of a few thousand nodes 
can represent a challenge, and networks with N > 10^ 
become almost impossible to deal with. 

A promising way around this problem is to coarse-grain 
the network, i.e. to reduce the number of nodes and edges 
by mapping the network with $N$ nodes and $E$ edges 
onto a smaller network with $\tilde{N}$ nodes and $\tilde{E}$ 
edges. $\tilde{N}$ and $\tilde{E}$ have to be small enough to be amenable 
to analysis and visualization. 

Several coarse-graining schemes have been proposed in 
the literature. The k-core decomposition, which is a node 
decimation approach, was first proposed in [5] to isolate 
the central core of a network, and was shown to be extremely 
effective for visualization purposes [6]. Alternatively, 
the number of nodes can be reduced by clumping 
them together in clusters. A widely accepted technique 
is based on community detection [7]. Within this framework, 
groups of nodes with more edges pointing to each 
other than to the rest of the network are considered as 
one single unit. After grouping the nodes, a much reduced 
"network of clusters" is obtained, representing the 
functional units of the network. Because of the importance 
and the complexity of finding meaningful clusters, a very 
large number of clustering algorithms have been developed 
recently [8, 9, 10, 11, 12, 13]. However, there 
is often no clear statement on whether properties of the 
initial network are preserved in the network of clusters. 

In this respect, the box-covering technique recently introduced 
by Song et al. [14] and further analyzed by Goh 
et al. [15] deserves a special mention: after covering self-similar 
networks with suitably defined boxes of a given 
size, the new networks obtained by substituting each box 
with a node preserve some of the topological features 
of the original ones. Song et al. have thus recognized 
that network reduction should go hand-in-hand with the 
preservation of some relevant network properties, akin to 
the renormalization group in statistical physics. Which 
properties should be preserved is, however, an unsettled 
issue: although the network topology can provide important 
clues about the organization of the system under 
scrutiny, it does not necessarily give any insight into the 
internal dynamics of the system. 

In this Letter we introduce a mathematical framework 
to coarse-grain networks, based on the idea of grouping 
nodes together. Thus our starting point is similar to clus- 
tering approaches. However, contrary to the clustering 
paradigm of identifying the "correct" communities in a 
network, our goal is to obtain a reduced network that 
preserves some properties of the initial one. Here the 
properties of interest will be the main characteristics of 
random walks on networks [16]. 

Random walks play a central role in a large number of 
dynamical processes taking place on complex networks. 
Their evolution is described by a stochastic matrix $W$. 
If $A$ is the adjacency matrix, then $W_{ij} = A_{ij} / \sum_l A_{lj}$ 
gives the transition probability from $j$ to $i$. $W$ has several 
interesting properties. In particular, for a connected 
undirected network, the Perron-Frobenius theorem states 
that the largest eigenvalue $\lambda^1$ is equal to 1. The right 
eigenvector $|p^1\rangle$ associated with $\lambda^1$ describes the stationary 
state, and the left eigenvector $\langle u^1|$ is constant. Typically, 
eigenvectors with eigenvalues close to one capture 
the large-scale behavior of the random walk. Since a 
coarse-grained network necessarily loses the fine details 
of the original one, our goal is to preserve the 
large-scale behavior of the random walk, hence its largest 
eigenvalues and the corresponding eigenvectors. 
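
The properties listed above can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical four-node toy graph and an illustrative helper name `transition_matrix` (neither appears in the Letter); it builds $W$ by column normalization and inspects the largest eigenvalue, the stationary state and the constant left eigenvector.

```python
import numpy as np

def transition_matrix(A):
    """Column-normalize an adjacency matrix: W[i, j] = A[i, j] / sum_l A[l, j]."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=0, keepdims=True)

# Hypothetical connected, undirected toy graph (a triangle with one pendant node).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = transition_matrix(A)

# Right eigenvectors satisfy W |p> = lambda |p>.
eigvals, right = np.linalg.eig(W)
order = np.argsort(-eigvals.real)
lam1 = eigvals.real[order[0]]        # largest eigenvalue
p1 = right[:, order[0]].real
p1 = p1 / p1.sum()                   # stationary state, normalized to sum to 1

degree = A.sum(axis=0)               # for undirected graphs, p1 is proportional to the degree
u1 = np.ones(len(A))                 # constant vector, a left eigenvector: u1 W = u1
```

For an undirected network the stationary state obtained this way should coincide with the normalized degree, which is the relation used later to define the operator $K$.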

As a starting point, we want to ensure that two nodes 
in an undirected network (say nodes 1 and 2) having 
exactly the same neighbors are grouped together, since 
they cannot be distinguished from the point of view of a 
random walk. In terms of a left eigenvector $\langle u^\alpha|$ of $W$, it 
means that $u^\alpha_1 = u^\alpha_2$ for any $\lambda^\alpha \neq 0$, since columns 1 and 
2 of $W$ are equal. The obvious coarse-graining step is to 
coalesce such a pair of nodes, with the resulting new node 
carrying the sum of the edges of the initial ones. The 
new network in which nodes 1 and 2 have been merged is 
characterized by an $(N-1) \times (N-1)$ adjacency matrix $\tilde{A}$, 
with the first row, resp. column, of $\tilde{A}$ being the sum of 
the first two rows, resp. columns, of $A$. On this reduced 
network, the stochastic matrix $\tilde{W}$ describing a random 
walk is obtained by normalizing the columns of $\tilde{A}$. 
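
The equality of the first two components of a left eigenvector follows in one line from the eigenvalue equation. The short derivation below is a sketch that only assumes columns 1 and 2 of $W$ coincide, which is the case when nodes 1 and 2 have exactly the same neighbors:

```latex
\lambda^\alpha u^\alpha_1
  = \sum_i u^\alpha_i W_{i1}
  = \sum_i u^\alpha_i W_{i2}
  = \lambda^\alpha u^\alpha_2
\quad\Longrightarrow\quad
u^\alpha_1 = u^\alpha_2 \quad \text{whenever } \lambda^\alpha \neq 0 .
```

For $\lambda^\alpha = 0$ the argument gives no constraint, which is why the statement is restricted to nonzero eigenvalues.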

At this stage, it will be useful to write $\tilde{W}$ as a product 
of three matrices, 

$\tilde{W} = R W K$. 

$K$ and $R$ are two projection-like operators from the 
$N$-dimensional space of the initial nodes to the $(N-1)$-dimensional 
space of the new nodes. In order to fulfill 
the definition of $\tilde{W}$ and using that $p^1_i \propto \sum_j A_{ij}$ for undirected 
networks, $K$ and $R$ are defined as: 

\[
R = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & I_{N-2} \end{pmatrix}, \qquad
K = \begin{pmatrix} \frac{p^1_1}{p^1_1 + p^1_2} & 0 \\ \frac{p^1_2}{p^1_1 + p^1_2} & 0 \\ 0 & I_{N-2} \end{pmatrix}
\]

(see [17] for a similar mathematical framework). The 
interesting features of $\tilde{W}$ come from the property [18] 
that if $u^\alpha_1 = u^\alpha_2$, the vector $\langle u^\alpha|K$ is a left eigenvector 
of $\tilde{W}$ with eigenvalue $\lambda^\alpha$ (i.e. $\langle u^\alpha|K = \langle \tilde{u}^\alpha|$). To obtain 
this result one simply needs to see that $\langle u^\alpha|KR = \langle u^\alpha|$ if 
$u^\alpha_1 = u^\alpha_2$. In the case of undirected networks, the result 
can be extended to the right eigenvectors. Under the 
same hypothesis ($u^\alpha_1 = u^\alpha_2$), the vector $R|p^\alpha\rangle$ is a right 
eigenvector of $\tilde{W}$ with eigenvalue $\lambda^\alpha$ (for $\alpha = 1$ the result 
holds as well in directed networks). Moreover, we could 
show analytically that a perturbative approach can be 
carried out. If $|u^\alpha_1 - u^\alpha_2| \propto \epsilon$ for a given $\lambda^\alpha \neq 0$, then $\langle u^\alpha|K$ and 
$R|p^\alpha\rangle$, resp. $\lambda^\alpha$, need to be corrected by vectors, resp. 
a scalar, scaling as $\epsilon$ in order to become left and right 
eigenvectors, resp. eigenvalue, of $\tilde{W}$. 
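
The exact (non-perturbative) statements above can be verified numerically. The sketch below assumes a hypothetical five-node undirected graph in which nodes 0 and 1 share the same neighbors, and it takes $R$ as the operator summing the two merged components and $K$ as the operator weighting them by the stationary state; these explicit forms and all variable names are illustrative assumptions. Eigenvectors are obtained from the symmetric matrix $D^{-1/2} A D^{-1/2}$, a standard trick for undirected networks rather than a step described in the Letter.

```python
import numpy as np

# Hypothetical undirected toy graph: nodes 0 and 1 have the same neighbors (2 and 3).
A = np.array([[0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [1, 1, 1, 0, 0],
              [0, 0, 1, 0, 0]], dtype=float)
N = len(A)
d = A.sum(axis=0)
W = A / d[None, :]                   # column-stochastic matrix
p1 = d / d.sum()                     # stationary state, proportional to the degree

# R sums the components of nodes 0 and 1; K weights them by p1.
R = np.zeros((N - 1, N))
R[0, 0] = R[0, 1] = 1.0
R[np.arange(1, N - 1), np.arange(2, N)] = 1.0
K = R.T * p1[:, None] / (R @ p1)[None, :]

W_tilde = R @ W @ K

# Reference construction: merge the nodes in A, then normalize the columns.
A_tilde = R @ A @ R.T
W_ref = A_tilde / A_tilde.sum(axis=0, keepdims=True)

# Eigen-decomposition through the symmetric matrix S = D^{-1/2} A D^{-1/2}.
S = A / np.sqrt(np.outer(d, d))
lam, V = np.linalg.eigh(S)
U = (V / np.sqrt(d)[:, None]).T      # rows: left eigenvectors of W
P = V * np.sqrt(d)[:, None]          # columns: right eigenvectors of W

max_err = 0.0
for a in range(N):
    if abs(lam[a]) < 1e-9:
        continue                     # u_0 = u_1 is only guaranteed for nonzero eigenvalues
    uK = U[a] @ K
    Rp = R @ P[:, a]
    max_err = max(max_err,
                  np.abs(uK @ W_tilde - lam[a] * uK).max(),
                  np.abs(W_tilde @ Rp - lam[a] * Rp).max())
```

In this sketch `W_tilde` coincides with the column-normalized merged adjacency matrix, and `max_err` measures how far the projected vectors are from being exact eigenvectors of the reduced matrix.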

To summarize, we have introduced a mathematical 
framework such that grouping nodes with similar components 
in $\langle u^\alpha|$ has a spectral interpretation: it preserves 
the eigenvalue $\lambda^\alpha$, averages the components of $\langle u^\alpha|$ and, 
for undirected networks, sums up the components of $|p^\alpha\rangle$. 

For simplicity the case where only two components of 
an eigenvector are equal (resp. close to each other) has 
been considered. It is straightforward to generalize the 
grouping to all nodes having the same components (resp. 
components close to each other) in $\langle u^\alpha|$. Groups are first 
labeled from 1 to $\tilde{N}$, and $\delta_{C,i}$ is defined as 1 if node $i$ 
belongs to group $C$, and 0 otherwise ($C = 1 \ldots \tilde{N}$). Then $K$ 
and $R$ read $R_{Ci} = \delta_{C,i}$ and 
$K_{iC} = \frac{p^1_i}{\sum_{j \in C} p^1_j}\, \delta_{C,i}$, with $R$ a 
$\tilde{N} \times N$ matrix and $K$ an $N \times \tilde{N}$ matrix. 
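
For an arbitrary partition of the nodes, the two operators can be assembled directly from a vector of group labels. The following is a minimal sketch, assuming $R$ sums the components within each group and $K$ weights them by the stationary state $|p^1\rangle$; the function name `coarse_graining_operators`, the 0-based labels and the example values are hypothetical.

```python
import numpy as np

def coarse_graining_operators(labels, p1):
    """Return R (n_groups x N) and K (N x n_groups) for integer labels 0..n_groups-1."""
    labels = np.asarray(labels)
    p1 = np.asarray(p1, dtype=float)
    N = labels.size
    n_groups = labels.max() + 1
    R = np.zeros((n_groups, N))
    R[labels, np.arange(N)] = 1.0              # R[C, i] = 1 if node i belongs to group C
    group_weight = R @ p1                      # sum of p1 over the nodes of each group
    K = (R * p1[None, :]).T / group_weight[None, :]
    return R, K

# Hypothetical example: five nodes in three groups.
labels = [0, 0, 1, 2, 1]
p1 = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
R, K = coarse_graining_operators(labels, p1)
```

Dense arrays are used here for readability; for large networks a sparse representation would be the natural choice, since each column of $R$ and each row of $K$ contains a single nonzero entry.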

The method can be further extended to more than one 
eigenvector: groups are then defined as the nodes having 
the same components (resp. components close to each 
other) over a set of $S$ left eigenvectors $\{\langle u^\alpha|\}$. Choosing 
the first $S$ non-trivial eigenvectors ensures that the 
slow modes of the random walk are conserved. 

Spectral properties of $W$ have been used previously in 
spectral clustering techniques [13]. However, the properties 
derived above show that there exists a way to preserve 
the spectral properties of a network while reducing 
its size, which is the ultimate goal of any coarse-graining 
strategy. 

To illustrate our coarse-graining scheme, we applied it 
to the di-alanine folding network studied in [19], considered 
as undirected. The network was built from a Molecular 
Dynamics simulation of a di-alanine peptide and consists 
of 1832 nodes (Fig. 1A). A node in Fig. 1A accounts 
for a configuration sampled during the simulation, and 
edges represent transitions between configurations [20]. 
The weight of an edge between two configurations is equal 
to the total number of transitions sampled during the 
simulation. In a previous work [19], the di-alanine folding 
network was shown to consist of four main clusters (colors 
in Fig. 1), corresponding to the four main energy basins 
of the underlying free-energy landscape. Random walks 
on such networks are representative of peptide dynamics 
since the elements of $W$ correspond to the effective transition 
probabilities, as observed along the simulation. 

To coarse-grain the network, we have used the first 
three non-trivial left eigenvectors $\langle u^2|$, $\langle u^3|$ and $\langle u^4|$ of 
$W$. Along each eigenvector, $I = 60$ intervals of equal size 
have been defined between the highest and the lowest 
component. Nodes have been grouped together if they 
belonged to the same interval along the three eigenvectors. 
In this way 227 non-empty groups have been found. 
The coarse-grained network is shown in Fig. 1B. Colors 
were set according to the clusters of the nodes in each 
group. Clearly the coarse-grained network is not equivalent 
to the network of clusters. Although the nodes of 
a group do not necessarily belong to the same cluster, 
this situation happened only for 4 groups (representing 
15 nodes) out of the 227. We also applied to the coarse-grained 
network the same clustering algorithm [21] used 
to identify the clusters in Fig. 1A. Exactly 4 clusters were 
obtained, corresponding to more than 98% of the initial 
nodes correctly classified. Thus, even if the aim of our 
coarse-graining approach is different from that of the usual 
clustering, the results are indeed consistent with the global 
features revealed by the cluster structure of the network. 
Moreover, the cluster structure is robust under coarse-graining. 
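
The grouping rule described in this paragraph (equal-size intervals along each eigenvector, nodes grouped when they share the same interval along all of them) can be sketched as follows. The function name `group_by_intervals` and the treatment of boundary values (the maximum is assigned to the last interval) are assumptions made for illustration; the Letter does not specify these details.

```python
import numpy as np

def group_by_intervals(U, n_intervals):
    """U: (S, N) array, one left eigenvector per row. Returns (labels, n_groups)."""
    U = np.asarray(U, dtype=float)
    lo = U.min(axis=1, keepdims=True)
    hi = U.max(axis=1, keepdims=True)
    width = (hi - lo) / n_intervals
    width[width == 0] = 1.0                      # constant eigenvector: single interval
    bins = np.floor((U - lo) / width).astype(int)
    bins = np.minimum(bins, n_intervals - 1)     # the maximum falls in the last interval
    # Nodes sharing the same tuple of interval indices form one group.
    _, labels = np.unique(bins.T, axis=0, return_inverse=True)
    labels = np.asarray(labels).ravel()
    return labels, int(labels.max()) + 1

# Hypothetical example: two eigenvectors, four nodes, four intervals.
U = np.array([[0.0, 0.01, 0.5, 1.0],
              [0.0, 0.00, 1.0, 1.0]])
labels, n_groups = group_by_intervals(U, n_intervals=4)
```

Only non-empty groups receive a label, so the number of groups is in general much smaller than the number of interval combinations (for instance 227 groups in the di-alanine example, against $60^3$ possible cells).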

As expected from our perturbative derivation, the first 
eigenvalues are preserved in the coarse-grained network 
with high accuracy (Table I A, columns 2 and 3). Moreover, 
the normalized scalar product (Table I A, columns 
4 and 5) shows that the projected eigenvectors $\langle u^\alpha|K$ 
and $R|p^\alpha\rangle$ are almost equal to the corresponding eigenvectors 
of $\tilde{W}$. Similar results have been obtained considering 
the giant component of an Erdős-Rényi network [22] 
($N = 5626$, $\langle k \rangle = 2$, Table I B) and a Barabási-Albert 
network [23] ($N = 6005$, $m = 1$, Table I C), always 
considering the first three non-trivial left eigenvectors 
$\langle u^2|$, $\langle u^3|$ and $\langle u^4|$ and $I = 60$. The general agreement 
indicates that our perturbative approach is robust for 
various kinds of networks, even if the components in $\langle u^\alpha|$ are 
not equal but close to each other within the groups. 

 $\alpha$   $\lambda^\alpha$   $\tilde{\lambda}^\alpha$   $\langle u^\alpha K|\tilde{u}^\alpha\rangle$ (normalized)   $\langle \tilde{p}^\alpha|R p^\alpha\rangle$ (normalized)
Box A
 2     0.99987   0.99987   0.9999     0.9999
 3     0.99947   0.99944   0.9998     0.9999
 4     0.99785   0.99780   0.9999     0.9999
Box B
 2     0.98955   0.98922   0.9985     0.9941
 3     0.98901   0.98861   0.9989     0.9924
 4     0.98779   0.98741   0.9885     0.9686
Box C
 2     0.99971   0.99971   0.999916   0.9999
 3     0.99934   0.99933   0.9994     0.9988
 4     0.99917   0.99916   0.9998     0.9997

TABLE I: Columns 2 and 3: the three largest (non-trivial) 
eigenvalues of the stochastic matrices $W$ and $\tilde{W}$. Column 4: 
Scalar product between $\langle u^\alpha|K$ and $\langle \tilde{u}^\alpha|$ for the three left 
eigenvectors used in the coarse-graining procedure. Column 5: 
Scalar product between $R|p^\alpha\rangle$ and $|\tilde{p}^\alpha\rangle$ for the three right 
eigenvectors. Box A: Di-alanine network shown in Fig. 1A 
and B. Box B: Erdős-Rényi network. Box C: Barabási-Albert 
network. 

FIG. 1: (Color online) A: Di-alanine folding network 
($N = 1832$). Node size is proportional to node weight (i.e. the 
number of times a node has been visited in the simulation). 
The four different colors correspond to the clusters found in 
[19]. B: Coarse-grained network ($\tilde{N} = 227$) according to $\langle u^\alpha|$, 
$\alpha = 2, 3, 4$. Node size is proportional to the total weight of 
the groups. Colors correspond to the clusters in which the 
nodes of each group have been classified. 

Fig. 1 hints that the global architecture of the coarse-grained 
network is representative of the original one. For 
instance, most of the nodes buried in the center of the 
red cluster form one single group, while the nodes lying 
along the few paths connecting the red and violet clusters, 
and therefore critical for the global connectivity of the 
network, are well preserved. A more stringent test is done 
by comparing the mean first passage time (MFPT) from 
node $j$ to node $i$, $T_{ij}$. In the context of transport phenomena 
or search on a network, the MFPT is an important 
characteristic of random walks [3, 24]. To compute it 
exactly, one usually considers node $i$ as a sink and uses 
the stochastic matrix $W$ with the $i$-th column set to 0. 
To compare the MFPTs, we used the coarse-graining shown in 
Fig. 1B, defining the sink node $i$ as a single group. Fig. 2 
shows with black circles (o) the average MFPT to node 
$i$ for each group in the original network. The MFPT 
to the group consisting of node $i$ in the coarse-grained 
network is displayed with red lines. The excellent overlap 
indicates that the MFPT is extremely well preserved, 
whereas this is not the case in the network of clusters 
(see insets in Fig. 2). Hence the coarse-grained network 
is representative of the general features of the diffusion 
process in the initial network. Moreover, this finding was 
shown to be robust if other eigenvectors are included, as 
long as the size of the intervals is kept small enough. In 
this respect the value of $I$ tunes the degree of precision: 
increasing $I$ improves the agreement between the 
initial and the coarse-grained network, but at the same 
time results in a larger $\tilde{N}$. 
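
As an illustration of the sink construction, the MFPT to a chosen node can be obtained by solving a small linear system. The sketch below is an assumption-laden example, not the authors' procedure: it uses the standard recursion $T_j = 1 + \sum_k W_{kj} T_k$ for $j \neq i$ with $T_i = 0$, which amounts to removing the sink from the walk, and the function name `mfpt_to_sink` and the three-node path graph are hypothetical.

```python
import numpy as np

def mfpt_to_sink(W, sink):
    """Mean first passage time to `sink` from every node.

    W is column-stochastic: W[k, j] is the probability of stepping from j to k.
    """
    W = np.asarray(W, dtype=float)
    N = W.shape[0]
    keep = np.array([j for j in range(N) if j != sink])
    Q = W[np.ix_(keep, keep)]                   # walk restricted to the non-sink nodes
    # Solve T_j = 1 + sum_k W[k, j] T_k for all j != sink (T_sink = 0).
    t = np.linalg.solve(np.eye(N - 1) - Q.T, np.ones(N - 1))
    T = np.zeros(N)
    T[keep] = t
    return T

# Hypothetical example: path graph 0 - 1 - 2 with the sink at node 0.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
W = A / A.sum(axis=0, keepdims=True)
T = mfpt_to_sink(W, sink=0)
```

The same function can be applied to the reduced matrix $\tilde{W}$, with the sink kept as a group of its own, to compare the MFPT of each group with the average MFPT of its nodes in the original network.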

In general the large eigenvalues and the corresponding eigenvectors of 
$W$ represent the large-scale behavior of random walks. 
However, in some cases eigenvectors are directly associated 
with useful quantities. This is the case with the 
PageRank matrix [25]. PageRank is defined via a stochastic 
process on the WWW, where at each step a "random 
surfer" either follows, with probability $d$, one of the existing 
outgoing links or jumps at random to another site 
with probability $(1 - d)$. The PageRank of the nodes 
corresponds to the stationary state of the process (i.e. to 
$|p^1\rangle$). Thus under coarse-graining the PageRank of a group in the 
reduced network is the sum of the PageRank of its nodes in the 
initial network. The only effect of the directed nature of the 
WWW, compared to undirected networks, is that nothing 
ensures that $|p^\alpha\rangle$ is preserved when coarse-graining along 
$\langle u^\alpha|$ for $\alpha > 1$. 
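
The additivity of PageRank under coarse-graining can be illustrated on a small directed graph. The sketch below assumes a hypothetical four-page graph, a damping value $d = 0.85$, uniform random jumps over all pages, and helper names (`google_matrix`, `stationary_state`) chosen for illustration; the operators $R$ and $K$ are built as summing and $|p^1\rangle$-weighted averaging operators.

```python
import numpy as np

def google_matrix(A, d=0.85):
    """Column-stochastic PageRank matrix; A[i, j] = 1 if page j links to page i.

    Assumes every page has at least one outgoing link.
    """
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    return d * A / A.sum(axis=0, keepdims=True) + (1.0 - d) / N

def stationary_state(W, n_iter=2000):
    """Stationary state by power iteration (adequate for small dense matrices)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(n_iter):
        p = W @ p
    return p / p.sum()

# Hypothetical directed graph with links 0->1, 1->2, 2->0, 2->3, 3->0.
A = np.zeros((4, 4))
for src, dst in [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]:
    A[dst, src] = 1.0

W = google_matrix(A)
p = stationary_state(W)                       # PageRank of the original pages

labels = np.array([0, 0, 1, 2])               # pages 0 and 1 are grouped together
R = np.zeros((labels.max() + 1, len(labels)))
R[labels, np.arange(len(labels))] = 1.0
K = (R * p[None, :]).T / (R @ p)[None, :]

W_tilde = R @ W @ K
p_tilde = stationary_state(W_tilde)           # PageRank of the groups
```

In this sketch `p_tilde` equals `R @ p` for any choice of groups, since $K R |p^1\rangle = |p^1\rangle$ holds by construction of $K$ whether or not the network is directed.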



FIG. 2: (Color online) Comparison of the MFPT. The circles 
(o) represent the average MFPT ranked for each group 
in the original network (variances are not shown since they 
are always smaller than the size of the circles). The MFPT of 
the corresponding nodes in the coarse-grained network is displayed 
with red lines. A: di-alanine folding network with the 
sink $i$ as the heaviest node of the red cluster. B: di-alanine 
folding network with the sink $i$ as the heaviest node of the 
blue cluster. Insets: Comparison of the MFPT between the 
original network (o) and the network of clusters (red line). 

As a second example, we consider the exit probability 
on a network with two nodes (say nodes 1 and $N$) in 
which the random walk is trapped. One can show that 
the exit probability in node 1, resp. $N$, starting at $j$ can 
be expressed as the $j$-th component of the left eigenvector 
$\langle u^1|$, resp. $\langle u^2|$, of the 
stochastic matrix describing the transition probabilities 
and including the two traps. Hence coarse-graining the 
network according to the eigenvector $\langle u^1|$, resp. $\langle u^2|$, is 
equivalent to preserving the exit probabilities in node 1, 
resp. $N$. In the case of a network describing the dynamics 
of a peptide, as in the example studied in this Letter, the 
exit probability can be associated with the p-fold [26], 
defined as the probability to reach the native state before 
the denatured state. If the two traps are chosen as 
representatives of the native and denatured states (for 
instance the heaviest nodes of the two main clusters of 
Fig. 1), our method allows one to coarse-grain the network 
in such a way that the p-fold is perfectly preserved for 
every configuration (see [27] for a related coarse-graining 
framework considering a continuous diffusion process). 

In conclusion, we have defined a mathematical framework 
for coarse-graining complex networks based on random 
walks. This operation has the intrinsic property 
of preserving the first eigenvalues and the corresponding 
eigenvectors. In this sense it can be regarded as a decimation 
of the fast modes, without altering the slow modes, 
akin to k-space coarse-graining, and eventually coming 
back to a real-space coarse-grained network. Moreover, 
we have shown for a network on which random walks 
have a physical interpretation that the coarse-graining 
provides a highly representative approximation of the initial 
network, giving rise to a way to circumvent the large 
size of complex networks for their analysis and visualization. 
Finally, from a computational point of view, the 
first eigenvectors are fast to calculate with the existing 
optimized routines for sparse matrices. Therefore our 
method can easily be utilized on large networks. 

Francesco Rao and Amedeo Caflisch for a critical reading 
of the manuscript. This work was financially supported 
by COSIN (FET Open IST 2001-33555), DELIS (FET 
Open 001907) and the SER-Bern (02.0234). 


[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002). 

[2] M. E. J. Newman, SIAM Rev. 45, 167 (2003). 

[3] S. N. Dorogovtsev and J. F. F. Mendes, Advances in Physics 51, 1079 (2002). 

[4] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Physics Reports 424, 175 (2006). 

[5] B. Bollobás, in Graph Theory and Combinatorics (Academic Press, London, 1984), pp. 35-37. 

[6] I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani, cs.NI/0511007 (2005). 

[7] M. Girvan and M. E. J. Newman, PNAS 99, 7821 (2002). 

[8] J. Reichardt and S. Bornholdt, Phys. Rev. Lett. 93, 218701 (2004). 

[9] L. Donetti and M. A. Muñoz, J. Stat. Mech. p. 10012 (2004). 

[10] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature 435, 814 (2005). 

[11] R. Guimerà and L. A. N. Amaral, Nature 433, 895 (2005). 

[12] M. E. J. Newman, Proc. Natl. Acad. Sci. USA 103, 8577 (2006). 

[13] A. Capocci, V. D. P. Servedio, G. Caldarelli, and F. Colaiori, Lecture Notes in Computer Science 3243, 181 (2004). 

[14] C. Song, S. Havlin, and H. A. Makse, Nature 433, 392 (2005). 

[15] K.-I. Goh, G. Salvi, B. Kahng, and D. Kim, Phys. Rev. Lett. 96, 018701 (2006). 

[16] J. D. Noh and H. Rieger, Phys. Rev. Lett. 92, 118701 (2004). 

[17] S. Lafon and A. B. Lee, IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 1393 (2006). 

[18] M. Meila and J. Shi, in AI and Statistics (AISTATS) (2001). 

[19] D. Gfeller, P. De Los Rios, A. Caflisch, and F. Rao, Proc. Natl. Acad. Sci. 104, 1817 (2007). 

[20] F. Rao and A. Caflisch, J. Mol. Biol. 342, 299 (2004). 

[21] A. J. Enright, S. Van Dongen, and C. A. Ouzounis, Nucleic Acids Research 30, 1575 (2002). 

[22] P. Erdős and A. Rényi, Publ. Math. Debrecen 6, 290 (1959). 

[23] A.-L. Barabási and R. Albert, Science 286, 509 (1999). 

[24] A. Baronchelli and V. Loreto, Phys. Rev. E 73, 026103 (2006). 

[25] S. Brin and L. Page, in Proc. 7th International WWW Conference (1998), pp. 107-117. 

[26] R. Du, V. S. Pande, A. Y. Grosberg, T. Tanaka, and E. Shakhnovich, J. Chem. Phys. 108, 334 (1998). 

[27] Y. M. Rhee and V. S. Pande, J. Phys. Chem. B 109, 6780 (2005). 



